pcvrDDPSC Data Science Core, August 2023
Josh Sumner
pcvr GoalsgrowthSSfitGrowthgrowthPlottestGrowthpcvr GoalsCurrently pcvr aims to:
There is room for goals to evolve based on feedback and scientific needs.
Pre-work was to install R, Rstudio, and pcvr with dependencies.
plantCV allows for user friendly high throughput image based phenotyping
Resulting data follows individuals over time, which changes our statistical needs.
Longitudinal Data is:
Five model building options are supported through the type argument of growthSS:
nls, nlrq, nlme, mgcv, and brms
Other than mgcv all model builders can fit 9 types of growth models.
| “nls” | “nlrq” | “nlme” | “mgcv” | “brms” |
|---|---|---|---|---|
stats::nls |
quantreg::nlrq |
nlme::nlme |
mgcv::gam |
brms::brms |
Non-linear least squares regression.
| Longitudinal Trait | nls |
|---|---|
| Non-linearity | ✅ |
| Autocorrelation | ❌ |
| Heteroskedasticity | ❌ |
| Linear Regression | Quantile Regression |
|---|---|
| Predicts mean E(Y|X) | Predicts quantiles Q(Y|X) |
| Works with small N | Requires higher N |
| Assumes Normality | No distributional assumptions |
| E(Y|X) breaks with transformation | Q(Y|X) robust to transformation |
| Sensitive to outliers | Robust to outliers |
| Computationally cheap | Computationally more expensive |
Non-linear quantile regression.
| Longitudinal Trait | nlrq |
|---|---|
| Non-linearity | ✅ |
| Autocorrelation | ❌ |
| Heteroskedasticity | ✅ |
Non-linear Mixed Effect Models
| Longitudinal Trait | nlme |
|---|---|
| Non-linearity | ✅ |
| Autocorrelation* | ✅ |
| Heteroskedasticity | ✅ |
| Being a headache | ✅ |
General Additive Models Only
| Longitudinal Trait | gam |
|---|---|
| Non-linearity | ✅ |
| Autocorrelation | ❌ |
| Heteroskedasticity | ❌ |
| Unparameterized | ✅ |
Bayesian hierarchical Models
| Longitudinal Trait | brms |
|---|---|
| Non-linearity | ✅ |
| Autocorrelation | ✅ |
| Heteroskedasticity | ✅ |
There are 6 main growth models supported in pcvr.
3 are asymptotic, 3 are non-asympototic.
There are also two double sigmoid curves intended for use with recovery experiments.
Generalized Additive Models (GAMs) are also supported.
Survival models can also be specified using the “survival” keyword. These models can use the “survival” or “flexsurv” backends where distributions can be specified as model = "survival {distribution}".
For details please see the growthSS documentation.
m <- mgcv::gam(y ~ group + s(time, by =factor(group)), data=simdf)
start<-min(simdf$time); end <-max(simdf$time)
support <- expand.grid(time = seq(start, end, length = 400),
group = factor(unique(simdf$group)))
out<-gam_diff(model=m, newdata=support, g1="a", g2="b",
byVar = "group", smoothVar="time", plot=T, patch=F)gam_diff predictionsgam_diff differencesgrowthSSAny of these models shown can be specified easily using growthSS.
growthSS - formWith a model specified a rough formula is required to parse your data to fit the model.
The layout of that formula is:
outcome ~ time|individual/group
growthSS - form 2Here we would use y~time|id/group
growthSS - form 3We can check that the grouping in our formula is correct with a plot.
growthSS - sigma“nlme” and “brms” models accept a sigma argument. Here we will only look at nlme models as brms models are the subject of the Advanced Growth Modeling tutorial.
growthSS - sigmaRecall the heteroskedasticity problem, shown again with our simulated data:
growthSS - sigmaThere are lots of ways to model a trend like that we see for sigma.
pcvr offers three options through growthSS for nlme models.
growthSS - “none” sigmaVariance can be modeled as homoskedastic by group.
growthSS - “power” sigmaVariance can be modeled using a power of the x term.
growthSS - “exp” sigmaVariance can be modeled using a exponent of the x term.
growthSS - varFunc Optionsnlme varFunc objects can also be passed to the sigma argument in growthSS.
See ?nlme::varClasses for details.
growthSS - startOne of the useful features in growthSS is that you do not need to specify starting values for the supported non-linear models (double sigmoid options notwithstanding).
growthSS - tauFinally, with mode=“nlrq” the tau argument takes one or more quantiles to fit a model for. By default this is 0.5, corresponding to the median.
growthSS - nlsgrowthSS - nlrqgrowthSS - nlmegrowthSS - mgcvgrowthSS - survival modelssurv_ss <- growthSS(model="survival weibull",
form = y>100~time|id/group,
df=simdf, type = "survreg") # type = "flexsurv" has more distribution options
lapply(surv_ss, class)$df
[1] "data.frame"
$formula
[1] "formula"
$pcvrForm
[1] "formula"
$distribution
[1] "character"
$type
[1] "character"
$model
[1] "character"
$call
[1] "call"
growthSS - survival modelssurv_ss <- growthSS(model="survival weibull",
form = y>100~time|id/group,
df=simdf, type = "survreg") # type = "flexsurv" has more distribution options
lapply(surv_ss, class)$df
[1] "data.frame"
$formula
[1] "formula"
$pcvrForm
[1] "formula"
$distribution
[1] "character"
$type
[1] "character"
$model
[1] "character"
$call
[1] "call"
Note that here we still supply the standard phenotype data but give a cutoff on the left hand side of the formula. That cutoff represents the “event”. For example, given area in pixels germination might be area_px>10 ~ time|id/group.
fitGrowthNow that we have the components for our model from growthSS we can fit the model with fitGrowth.
fitGrowth 2With non-brms models the steps to fit a model specified by growthSS are very simple and can be left to fitGrowth.
fitGrowth 3Additional arguments can be passed to fitGrowth and are used as follows:
| type | … |
|---|---|
| nls | passed to stats::nls |
| nlrq | cores to run in parallel, passed to quantreg::nlrq |
| nlme | passed to nlme::nlmeControl |
| mgcv | passed to mgcv::gam |
growthPlotModels fit by fitGrowth can be visualized using growthPlot.
growthPlot - nlsgrowthPlot - nlrqgrowthPlot - nlmegrowthPlot - mgcvgrowthPlot - survHypothesis testing for frequentist non-linear growth models can be somewhat limited.
Broadly, testing is implemented for all backends by comparing models against nested null models and for select backends non-linear testing is available using testGrowth.
testGrowth - nlsAnalysis of Variance Table
Model 1: y ~ A/(1 + exp((B[group] - time)/C[group]))
Model 2: y ~ A[group]/(1 + exp((B[group] - time)/C[group]))
Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
1 995 201112
2 994 168892 1 32220 189.63 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
testGrowth - nls 2testGrowth - nlrqHere we only print out the comparison for the model of the 49th percentile, but all taus are returned.
Model 1: y ~ A[group]/(1 + exp((B[group] - time)/C[group]))
Model 2: y ~ A/(1 + exp((B[group] - time)/C[group]))
#Df LogLik Df Chisq Pr(>Chisq)
1 6 -3971.0
2 5 -4101.7 -1 261.51 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
testGrowth - nlmetestGrowth - nlme 2testGrowth(nls_ss, nlme_fit, test = list("A.groupa - A.groupb",
"B.groupa - (B.groupb*1.25)",
"(C.groupa+1) - C.groupa")) Form Estimate SE Z-value p-value
1 A.groupa - A.groupb 39.202093 4.8025053 8.162842 3.272318e-16
2 B.groupa - (B.groupb*1.25) -1.806234 0.2039578 8.855920 8.299624e-19
3 (C.groupa+1) - C.groupa 1.000000 0.0000000 Inf 0.000000e+00
testGrowth - mgcvDue to GAMs nature we cannot test parameters individually.
Analysis of Deviance Table
Model 1: y ~ s(time)
Model 2: y ~ 0 + group + s(time, by = group)
Resid. Df Resid. Dev Df Deviance F Pr(>F)
1 992.53 311323
2 985.19 169322 7.3344 142001 112.91 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
testGrowth - survThe flexsurv backend provides more flexibility, but standard survreg models are tested using survival::survdiff
If you run into a novel situation please reach out and we will try to come up with a solution and add it to pcvr if possible.
Good ways to reach out are the help-datascience slack channel and pcvr github repository.